Overview

Dataset statistics

Number of variables: 22
Number of observations: 111923
Missing cells: 11724
Missing cells (%): 0.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 18.8 MiB
Average record size in memory: 176.0 B
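
The percentages and averages in this block follow from the raw counts. The sketch below recomputes them from the figures reported above; the variable names are illustrative and not part of the report.

```python
# Figures copied from the Dataset statistics block of this report.
n_variables = 22
n_observations = 111_923
missing_cells = 11_724
total_size_mib = 18.8

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.0f} B")
```

Both derived values round to the reported 0.5% and roughly 176 B.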

Variable types

Categorical: 11
DateTime: 6
Numeric: 5

Alerts

order_id has a high cardinality: 98115 distinct values (High cardinality)
customer_id has a high cardinality: 98115 distinct values (High cardinality)
product_id has a high cardinality: 32486 distinct values (High cardinality)
seller_id has a high cardinality: 3050 distinct values (High cardinality)
product_category_name has a high cardinality: 73 distinct values (High cardinality)
customer_unique_id has a high cardinality: 94855 distinct values (High cardinality)
customer_city has a high cardinality: 4111 distinct values (High cardinality)
mes_compra is highly correlated with ano_compra (High correlation)
ano_compra is highly correlated with mes_compra (High correlation)
ano_compra is highly correlated with ano_mes (High correlation)
ano_mes is highly correlated with ano_compra (High correlation)
mes_compra is highly correlated with ano_compra and 1 other field (High correlation)
ano_compra is highly correlated with mes_compra and 1 other field (High correlation)
ano_mes is highly correlated with mes_compra and 1 other field (High correlation)
order_delivered_carrier_date has 1873 (1.7%) missing values (Missing)
order_delivered_customer_date has 3119 (2.8%) missing values (Missing)
product_category_name has 2315 (2.1%) missing values (Missing)
order_id is uniformly distributed (Uniform)
customer_id is uniformly distributed (Uniform)
customer_unique_id is uniformly distributed (Uniform)
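
A high-cardinality alert boils down to counting distinct values and comparing the count with a cut-off. The sketch below shows one way to do that. The function name and the default threshold of 50 are assumptions for illustration; the report itself does not state the cut-off it used.

```python
def cardinality_summary(values, threshold=50):
    """Return the distinct count, distinct share, and a high-cardinality flag.

    `threshold` is an assumed cut-off for illustration only.
    Missing values (None) are excluded before counting.
    """
    present = [v for v in values if v is not None]
    distinct = len(set(present))
    distinct_pct = 100 * distinct / len(present) if present else 0.0
    return {
        "distinct": distinct,
        "distinct_pct": distinct_pct,
        "high_cardinality": distinct > threshold,
    }


# Toy column with one missing value and a deliberately low threshold.
summary = cardinality_summary(["a", "b", "a", None], threshold=1)
print(summary)
```

The distinct share is taken over non-missing values, which matches the product_id figures later in this report (32486 distinct out of 111923 - 717 present values is about 29.2%).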

Reproduction

Analysis started: 2022-09-01 17:39:51.921877
Analysis finished: 2022-09-01 17:40:33.028876
Duration: 41.11 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
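
A report of this kind can be regenerated with pandas-profiling. The sketch below is a minimal outline, not the configuration used for this run: the input path, output path, and title are hypothetical because the report does not name its source file, and the settings stored in config.json are not reproduced here.

```python
def build_report(csv_path, html_path="report.html"):
    """Profile a CSV file and write the HTML report to `html_path`.

    Both paths are hypothetical placeholders.
    """
    # Imported lazily so this module loads even where the libraries are absent.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title="Profiling Report")
    report.to_file(html_path)
    return html_path
```

Calling `build_report("orders.csv")` with a real file would write `report.html` next to the script.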

Variables

order_id
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 98115
Distinct (%): 87.7%
Missing: 0
Missing (%): 0.0%
Memory size: 874.5 KiB

Length

Max length: 32
Median length: 32
Mean length: 32
Min length: 32

Characters and Unicode

Total characters: 3581536
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
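
As a small illustration of those character properties, Python's standard `unicodedata` module returns the general category of each character. The helper name below is illustrative; the sample string is the first order_id row shown in this report.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# First sample row of order_id from this report.
sample = "e481f51cbdc54678b7cc49136f2d6af7"
counts = category_counts(sample)
print(counts)
```

For a hexadecimal identifier only two categories appear: `Nd` (Decimal Number) and `Ll` (Lowercase Letter), which is why the report lists 2 distinct categories for this variable.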

Unique

Unique: 88438
Unique (%): 79.0%
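
The report distinguishes distinct values (how many different values occur) from unique values (how many occur exactly once). Here 98115 - 88438 = 9677 identifiers appear on more than one row. A minimal sketch of the two counts on toy data, with an illustrative function name:

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


toy = ["a", "b", "b", "c", "c", "c", "d"]
distinct, unique = distinct_and_unique(toy)
print(distinct, unique)
```

In the toy list there are four different values, of which only "a" and "d" occur once.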

Sample

1st row: e481f51cbdc54678b7cc49136f2d6af7
2nd row: 53cdb2fc8bc7dce0b6741e2150273451
3rd row: 47770eb9100c2d0c44946d9cf07ec65d
4th row: 949d5b44dbf5de918fe9c16f97b45f8a
5th row: ad21c59c0840e6cb83a9ceb5573f8159

Common Values

Value                             Count   Frequency (%)
8272b63d03f5f79c56e9e4120aec44ef  21      < 0.1%
ab14fdcfbe524636d65ee38360e22ce8  20      < 0.1%
1b15974a0141d54e36626dca3fdc731a  20      < 0.1%
428a2f660dc84138d969ccd69a0ab6d5  15      < 0.1%
9ef13efd6949e4573a18964dd1bbe7f5  15      < 0.1%
9bdc4d4c71aa1de4606060929dee888c  14      < 0.1%
73c8ab38f07dc94389065f7eba4f297a  14      < 0.1%
37ee401157a3a0b28c9c6d0ed8c3b24b  13      < 0.1%
c05d6a79e55da72ca780ce90364abed9  12      < 0.1%
af822dacd6f5cff7376413c03a388bb7  12      < 0.1%
Other values (98105)              111767  99.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
8272b63d03f5f79c56e9e4120aec44ef21
 
< 0.1%
1b15974a0141d54e36626dca3fdc731a20
 
< 0.1%
ab14fdcfbe524636d65ee38360e22ce820
 
< 0.1%
428a2f660dc84138d969ccd69a0ab6d515
 
< 0.1%
9ef13efd6949e4573a18964dd1bbe7f515
 
< 0.1%
9bdc4d4c71aa1de4606060929dee888c14
 
< 0.1%
73c8ab38f07dc94389065f7eba4f297a14
 
< 0.1%
37ee401157a3a0b28c9c6d0ed8c3b24b13
 
< 0.1%
2c2a19b5703863c908512d135aa6accc12
 
< 0.1%
637617b3ffe9e2f7a2411243829226d012
 
< 0.1%
Other values (98105)111767
99.9%

Most occurring characters

ValueCountFrequency (%)
4225109
 
6.3%
e224538
 
6.3%
6224490
 
6.3%
3224365
 
6.3%
b224345
 
6.3%
7224298
 
6.3%
a224031
 
6.3%
2223936
 
6.3%
1223771
 
6.2%
8223741
 
6.2%
Other values (6)1338912
37.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2238464
62.5%
Lowercase Letter1343072
37.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
4225109
10.1%
6224490
10.0%
3224365
10.0%
7224298
10.0%
2223936
10.0%
1223771
10.0%
8223741
10.0%
9223318
10.0%
0223076
10.0%
5222360
9.9%
Lowercase Letter
ValueCountFrequency (%)
e224538
16.7%
b224345
16.7%
a224031
16.7%
c223680
16.7%
f223608
16.6%
d222870
16.6%

Most occurring scripts

ValueCountFrequency (%)
Common2238464
62.5%
Latin1343072
37.5%

Most frequent character per script

Common
ValueCountFrequency (%)
4225109
10.1%
6224490
10.0%
3224365
10.0%
7224298
10.0%
2223936
10.0%
1223771
10.0%
8223741
10.0%
9223318
10.0%
0223076
10.0%
5222360
9.9%
Latin
ValueCountFrequency (%)
e224538
16.7%
b224345
16.7%
a224031
16.7%
c223680
16.7%
f223608
16.6%
d222870
16.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII3581536
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4225109
 
6.3%
e224538
 
6.3%
6224490
 
6.3%
3224365
 
6.3%
b224345
 
6.3%
7224298
 
6.3%
a224031
 
6.3%
2223936
 
6.3%
1223771
 
6.2%
8223741
 
6.2%
Other values (6)1338912
37.4%

customer_id
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 98115
Distinct (%): 87.7%
Missing: 0
Missing (%): 0.0%
Memory size: 874.5 KiB

Length

Max length: 32
Median length: 32
Mean length: 32
Min length: 32

Characters and Unicode

Total characters: 3581536
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 88438
Unique (%): 79.0%

Sample

1st row: 9ef432eb6251297304e76186b10a928d
2nd row: b0830fb4747a6c6d20dea0b8c802d7ef
3rd row: 41ce2a54c0b03bf3443c3d931a367089
4th row: f88197465ea7920adcdbec7375364d82
5th row: 8ab97904e6daea8866dbdbc4fb7aad2c

Common Values

Value                             Count   Frequency (%)
fc3d1daec319d62d49bfb5e1f83123e9  21      < 0.1%
bd5d39761aa56689a265d95d8d32b8be  20      < 0.1%
be1b70680b9f9694d8c70f41fa3dc92b  20      < 0.1%
10de381f8a8d23fff822753305f71cae  15      < 0.1%
adb32467ecc74b53576d9d13a5a55891  15      < 0.1%
a7693fba2ff9583c78751f2b66ecab9d  14      < 0.1%
d5f2b3f597c7ccafbb5cac0bcc3d6024  14      < 0.1%
7d321bd4e8ba1caf74c4c1aabd9ae524  13      < 0.1%
3b54b5978e9ace64a63f90d176ffb158  12      < 0.1%
9eb3d566e87289dcb0acf28e1407c839  12      < 0.1%
Other values (98105)              111767  99.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
fc3d1daec319d62d49bfb5e1f83123e921
 
< 0.1%
be1b70680b9f9694d8c70f41fa3dc92b20
 
< 0.1%
bd5d39761aa56689a265d95d8d32b8be20
 
< 0.1%
10de381f8a8d23fff822753305f71cae15
 
< 0.1%
adb32467ecc74b53576d9d13a5a5589115
 
< 0.1%
a7693fba2ff9583c78751f2b66ecab9d14
 
< 0.1%
d5f2b3f597c7ccafbb5cac0bcc3d602414
 
< 0.1%
7d321bd4e8ba1caf74c4c1aabd9ae52413
 
< 0.1%
0d93f21f3e8543a9d0d8ece01561f5b212
 
< 0.1%
daf15f1b940cc6a72ba558f093dc00dd12
 
< 0.1%
Other values (98105)111767
99.9%

Most occurring characters

ValueCountFrequency (%)
f224619
 
6.3%
5224352
 
6.3%
c224326
 
6.3%
1224233
 
6.3%
8224174
 
6.3%
6224111
 
6.3%
2223960
 
6.3%
7223843
 
6.2%
a223829
 
6.2%
3223825
 
6.2%
Other values (6)1340264
37.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2238036
62.5%
Lowercase Letter1343500
37.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
5224352
10.0%
1224233
10.0%
8224174
10.0%
6224111
10.0%
2223960
10.0%
7223843
10.0%
3223825
10.0%
9223530
10.0%
4223011
10.0%
0222997
10.0%
Lowercase Letter
ValueCountFrequency (%)
f224619
16.7%
c224326
16.7%
a223829
16.7%
b223678
16.6%
e223530
16.6%
d223518
16.6%

Most occurring scripts

ValueCountFrequency (%)
Common2238036
62.5%
Latin1343500
37.5%

Most frequent character per script

Common
ValueCountFrequency (%)
5224352
10.0%
1224233
10.0%
8224174
10.0%
6224111
10.0%
2223960
10.0%
7223843
10.0%
3223825
10.0%
9223530
10.0%
4223011
10.0%
0222997
10.0%
Latin
ValueCountFrequency (%)
f224619
16.7%
c224326
16.7%
a223829
16.7%
b223678
16.6%
e223530
16.6%
d223518
16.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII3581536
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
f224619
 
6.3%
5224352
 
6.3%
c224326
 
6.3%
1224233
 
6.3%
8224174
 
6.3%
6224111
 
6.3%
2223960
 
6.3%
7223843
 
6.2%
a223829
 
6.2%
3223825
 
6.2%
Other values (6)1340264
37.4%

order_status
Categorical

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 874.5 KiB

Length

Max length: 11
Median length: 9
Mean length: 8.984140883
Min length: 7

Characters and Unicode

Total characters: 1005532
Distinct characters: 17
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: delivered
2nd row: delivered
3rd row: delivered
4th row: delivered
5th row: delivered

Common Values

Value        Count   Frequency (%)
delivered    108811  97.2%
shipped      1171    1.0%
canceled     637     0.6%
unavailable  601     0.5%
processing   355     0.3%
invoiced     340     0.3%
created      5       < 0.1%
approved     3       < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot


Most occurring characters

ValueCountFrequency (%)
e330187
32.8%
d219778
21.9%
i111618
 
11.1%
l110650
 
11.0%
v109755
 
10.9%
r109174
 
10.9%
p2703
 
0.3%
a2448
 
0.2%
c1974
 
0.2%
n1933
 
0.2%
Other values (7)5312
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1005532
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e330187
32.8%
d219778
21.9%
i111618
 
11.1%
l110650
 
11.0%
v109755
 
10.9%
r109174
 
10.9%
p2703
 
0.3%
a2448
 
0.2%
c1974
 
0.2%
n1933
 
0.2%
Other values (7)5312
 
0.5%

Most occurring scripts

ValueCountFrequency (%)
Latin1005532
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e330187
32.8%
d219778
21.9%
i111618
 
11.1%
l110650
 
11.0%
v109755
 
10.9%
r109174
 
10.9%
p2703
 
0.3%
a2448
 
0.2%
c1974
 
0.2%
n1933
 
0.2%
Other values (7)5312
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII1005532
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e330187
32.8%
d219778
21.9%
i111618
 
11.1%
l110650
 
11.0%
v109755
 
10.9%
r109174
 
10.9%
p2703
 
0.3%
a2448
 
0.2%
c1974
 
0.2%
n1933
 
0.2%
Other values (7)5312
 
0.5%

DateTime

Distinct: 592
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 874.5 KiB
Minimum: 2017-01-06 00:00:00
Maximum: 2018-08-20 00:00:00
Histogram with fixed size bins (bins=50)

DateTime

Distinct: 591
Distinct (%): 0.5%
Missing: 115
Missing (%): 0.1%
Memory size: 874.5 KiB
Minimum: 2017-01-06 00:00:00
Maximum: 2018-08-24 00:00:00
Histogram with fixed size bins (bins=50)

order_delivered_carrier_date
DateTime

MISSING

Distinct: 509
Distinct (%): 0.5%
Missing: 1873
Missing (%): 1.7%
Memory size: 874.5 KiB
Minimum: 2017-01-09 00:00:00
Maximum: 2018-09-11 00:00:00
Histogram with fixed size bins (bins=50)

order_delivered_customer_date
DateTime

MISSING

Distinct: 603
Distinct (%): 0.6%
Missing: 3119
Missing (%): 2.8%
Memory size: 874.5 KiB
Minimum: 2017-01-12 00:00:00
Maximum: 2018-10-17 00:00:00
Histogram with fixed size bins (bins=50)

DateTime

Distinct: 418
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 874.5 KiB
Minimum: 2017-02-07 00:00:00
Maximum: 2018-10-25 00:00:00
Histogram with fixed size bins (bins=50)

order_item_id
Real number (ℝ≥0)

Distinct: 21
Distinct (%): < 0.1%
Missing: 717
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.198019891
Minimum: 1
Maximum: 21
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 874.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 2
Maximum: 21
Range: 20
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.7064445674
Coefficient of variation (CV): 0.5896768265
Kurtosis: 104.2594651
Mean: 1.198019891
Median Absolute Deviation (MAD): 0
Skewness: 7.601320234
Sum: 133227
Variance: 0.4990639268
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=21)

Common Values

Value              Count  Frequency (%)
1                  97398  87.0%
2                  9677   8.6%
3                  2261   2.0%
4                  952    0.9%
5                  453    0.4%
6                  252    0.2%
7                  58     0.1%
8                  36     < 0.1%
9                  28     < 0.1%
10                 25     < 0.1%
Other values (11)  66     0.1%
(Missing)          717    0.6%

Lowest values

Value  Count  Frequency (%)
1      97398  87.0%
2      9677   8.6%
3      2261   2.0%
4      952    0.9%
5      453    0.4%
6      252    0.2%
7      58     0.1%
8      36     < 0.1%
9      28     < 0.1%
10     25     < 0.1%

Highest values

Value  Count  Frequency (%)
21     1      < 0.1%
20     3      < 0.1%
19     3      < 0.1%
18     3      < 0.1%
17     3      < 0.1%
16     3      < 0.1%
15     5      < 0.1%
14     7      < 0.1%
13     8      < 0.1%
12     13     < 0.1%

product_id
Categorical

HIGH CARDINALITY

Distinct: 32486
Distinct (%): 29.2%
Missing: 717
Missing (%): 0.6%
Memory size: 874.5 KiB

Length

Max length: 32
Median length: 32
Mean length: 32
Min length: 32

Characters and Unicode

Total characters: 3558592
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 17864
Unique (%): 16.1%

Sample

1st row: 87285b34884572647811a353c7ac498a
2nd row: 595fac2a385ac33a80bd5114aec74eb8
3rd row: aa4383b373c6aca5d8797843e5594415
4th row: d0b61bfb1de832b15ba9d266ca96e5b0
5th row: 65266b2da20d04dbe00c5c2d3bb7859e

Common Values

Value                             Count   Frequency (%)
aca2eb7d00ea1a7b8ebd4e68314663af  527     0.5%
99a4788cb24856965c36a24e339b6058  488     0.4%
422879e10f46682990de24d770e7f83d  484     0.4%
389d119b48cf3043d311335e499d9c6b  392     0.4%
368c6c730842d78016ad823897a372db  387     0.3%
53759a2ecddad2bb87a079a1f1519f73  370     0.3%
d1c427060a0f73f6b889a5c7c61f2ac4  342     0.3%
53b36df67ebb7c41585e8d54d6772e08  323     0.3%
154e7e31ebfa092203795c972e5804a6  281     0.3%
3dd2a17168ec895c781a9191c1e95ad7  274     0.2%
Other values (32476)              107338  95.9%
(Missing)                         717     0.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
aca2eb7d00ea1a7b8ebd4e68314663af527
 
0.5%
99a4788cb24856965c36a24e339b6058488
 
0.4%
422879e10f46682990de24d770e7f83d484
 
0.4%
389d119b48cf3043d311335e499d9c6b392
 
0.4%
368c6c730842d78016ad823897a372db387
 
0.3%
53759a2ecddad2bb87a079a1f1519f73370
 
0.3%
d1c427060a0f73f6b889a5c7c61f2ac4342
 
0.3%
53b36df67ebb7c41585e8d54d6772e08323
 
0.3%
154e7e31ebfa092203795c972e5804a6281
 
0.3%
3dd2a17168ec895c781a9191c1e95ad7274
 
0.2%
Other values (32476)107338
96.5%

Most occurring characters

ValueCountFrequency (%)
3228870
 
6.4%
9226536
 
6.4%
e224669
 
6.3%
7224186
 
6.3%
8223825
 
6.3%
4223416
 
6.3%
a222982
 
6.3%
c222118
 
6.2%
0222106
 
6.2%
2222081
 
6.2%
Other values (6)1317803
37.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2233846
62.8%
Lowercase Letter1324746
37.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3228870
10.2%
9226536
10.1%
7224186
10.0%
8223825
10.0%
4223416
10.0%
0222106
9.9%
2222081
9.9%
5221504
9.9%
6221108
9.9%
1220214
9.9%
Lowercase Letter
ValueCountFrequency (%)
e224669
17.0%
a222982
16.8%
c222118
16.8%
b220702
16.7%
d218679
16.5%
f215596
16.3%

Most occurring scripts

ValueCountFrequency (%)
Common2233846
62.8%
Latin1324746
37.2%

Most frequent character per script

Common
ValueCountFrequency (%)
3228870
10.2%
9226536
10.1%
7224186
10.0%
8223825
10.0%
4223416
10.0%
0222106
9.9%
2222081
9.9%
5221504
9.9%
6221108
9.9%
1220214
9.9%
Latin
ValueCountFrequency (%)
e224669
17.0%
a222982
16.8%
c222118
16.8%
b220702
16.7%
d218679
16.5%
f215596
16.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII3558592
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3228870
 
6.4%
9226536
 
6.4%
e224669
 
6.3%
7224186
 
6.3%
8223825
 
6.3%
4223416
 
6.3%
a222982
 
6.3%
c222118
 
6.2%
0222106
 
6.2%
2222081
 
6.2%
Other values (6)1317803
37.0%

seller_id
Categorical

HIGH CARDINALITY

Distinct: 3050
Distinct (%): 2.7%
Missing: 717
Missing (%): 0.6%
Memory size: 874.5 KiB

Length

Max length: 32
Median length: 32
Mean length: 32
Min length: 32

Characters and Unicode

Total characters: 3558592
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 497
Unique (%): 0.4%

Sample

1st row: 3504c0cb71d7fa48d967e0e4c94d59d9
2nd row: 289cdb325fb7e7f891c38608bf9e0962
3rd row: 4869f7a5dfa277a7dca6462dcf3b52b2
4th row: 66922902710d126a0e7d26b0e3805106
5th row: 2c9e548be18521d1c43cde1c582c6de8

Common Values

Value                             Count  Frequency (%)
6560211a19b47992c3666cc44a7e94c0  2000   1.8%
4a3ca9315b744ce9f8e9374361493884  1982   1.8%
1f50f920176fa81dab994f9023523100  1923   1.7%
cc419e0650a3c5ba77189a1882b7556a  1766   1.6%
da8622b14eb17ae2831f4ac5b9dab84a  1534   1.4%
955fee9216a65b617aa5c0531780ce60  1483   1.3%
1025f0e2d44d7041d6cf58b6550e0bfa  1422   1.3%
7c67e1448b00f6e969d365cea6b010ab  1364   1.2%
ea8482cd71df3c1969d7b9473ff13abc  1200   1.1%
7a67c85e85bb2ce8582c35f2203ad736  1170   1.0%
Other values (3040)               95362  85.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
6560211a19b47992c3666cc44a7e94c02000
 
1.8%
4a3ca9315b744ce9f8e93743614938841982
 
1.8%
1f50f920176fa81dab994f90235231001923
 
1.7%
cc419e0650a3c5ba77189a1882b7556a1766
 
1.6%
da8622b14eb17ae2831f4ac5b9dab84a1534
 
1.4%
955fee9216a65b617aa5c0531780ce601483
 
1.3%
1025f0e2d44d7041d6cf58b6550e0bfa1422
 
1.3%
7c67e1448b00f6e969d365cea6b010ab1364
 
1.2%
ea8482cd71df3c1969d7b9473ff13abc1200
 
1.1%
7a67c85e85bb2ce8582c35f2203ad7361170
 
1.1%
Other values (3040)95362
85.8%

Most occurring characters

ValueCountFrequency (%)
1241277
 
6.8%
c234639
 
6.6%
4233185
 
6.6%
6229166
 
6.4%
0228229
 
6.4%
a226825
 
6.4%
3226197
 
6.4%
b226194
 
6.4%
9220774
 
6.2%
2220068
 
6.2%
Other values (6)1272038
35.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2250830
63.3%
Lowercase Letter1307762
36.7%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1241277
10.7%
4233185
10.4%
6229166
10.2%
0228229
10.1%
3226197
10.0%
9220774
9.8%
2220068
9.8%
5217495
9.7%
8217424
9.7%
7217015
9.6%
Lowercase Letter
ValueCountFrequency (%)
c234639
17.9%
a226825
17.3%
b226194
17.3%
e209539
16.0%
f206299
15.8%
d204266
15.6%

Most occurring scripts

ValueCountFrequency (%)
Common2250830
63.3%
Latin1307762
36.7%

Most frequent character per script

Common
ValueCountFrequency (%)
1241277
10.7%
4233185
10.4%
6229166
10.2%
0228229
10.1%
3226197
10.0%
9220774
9.8%
2220068
9.8%
5217495
9.7%
8217424
9.7%
7217015
9.6%
Latin
ValueCountFrequency (%)
c234639
17.9%
a226825
17.3%
b226194
17.3%
e209539
16.0%
f206299
15.8%
d204266
15.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII3558592
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1241277
 
6.8%
c234639
 
6.6%
4233185
 
6.6%
6229166
 
6.4%
0228229
 
6.4%
a226825
 
6.4%
3226197
 
6.4%
b226194
 
6.4%
9220774
 
6.2%
2220068
 
6.2%
Other values (6)1272038
35.7%

DateTime

Distinct: 528
Distinct (%): 0.5%
Missing: 717
Missing (%): 0.6%
Memory size: 874.5 KiB
Minimum: 2017-01-10 00:00:00
Maximum: 2020-04-09 00:00:00
Histogram with fixed size bins (bins=50)

price
Real number (ℝ≥0)

Distinct: 5914
Distinct (%): 5.3%
Missing: 717
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 120.8459546
Minimum: 0.85
Maximum: 6735
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 874.5 KiB

Quantile statistics

Minimum: 0.85
5-th percentile: 17.37
Q1: 39.9
Median: 74.99
Q3: 134.99
95-th percentile: 349.9
Maximum: 6735
Range: 6734.15
Interquartile range (IQR): 95.09

Descriptive statistics

Standard deviation: 183.9417558
Coefficient of variation (CV): 1.522117611
Kurtosis: 121.1401143
Mean: 120.8459546
Median Absolute Deviation (MAD): 42.09
Skewness: 7.933903095
Sum: 13438795.23
Variance: 33834.56953
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value                Count  Frequency (%)
59.9                 2446   2.2%
69.9                 1975   1.8%
49.9                 1919   1.7%
89.9                 1536   1.4%
99.9                 1418   1.3%
39.9                 1318   1.2%
29.9                 1301   1.2%
79.9                 1196   1.1%
19.9                 1181   1.1%
29.99                1162   1.0%
Other values (5904)  95754  85.6%

Lowest values

Value  Count  Frequency (%)
0.85   3      < 0.1%
1.2    20     < 0.1%
2.29   1      < 0.1%
2.99   1      < 0.1%
3      2      < 0.1%
3.06   3      < 0.1%
3.49   3      < 0.1%
3.5    7      < 0.1%
3.54   1      < 0.1%
3.85   3      < 0.1%

Highest values

Value    Count  Frequency (%)
6735     1      < 0.1%
6729     1      < 0.1%
6499     1      < 0.1%
4799     1      < 0.1%
4690     1      < 0.1%
4590     1      < 0.1%
4399.87  1      < 0.1%
4099.99  1      < 0.1%
4059     1      < 0.1%
3999.9   1      < 0.1%

freight_value
Real number (ℝ≥0)

Distinct: 6977
Distinct (%): 6.3%
Missing: 717
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 20.03207893
Minimum: 0
Maximum: 409.68
Zeros: 381
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 874.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 7.78
Q1: 13.08
Median: 16.28
Q3: 21.18
95-th percentile: 45.2
Maximum: 409.68
Range: 409.68
Interquartile range (IQR): 8.1

Descriptive statistics

Standard deviation: 15.8479909
Coefficient of variation (CV): 0.7911306139
Kurtosis: 59.74854478
Mean: 20.03207893
Median Absolute Deviation (MAD): 3.59
Skewness: 5.64120452
Sum: 2227687.37
Variance: 251.1588157
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value                Count  Frequency (%)
15.1                 3707   3.3%
7.78                 2260   2.0%
14.1                 1875   1.7%
11.85                1845   1.6%
18.23                1561   1.4%
7.39                 1476   1.3%
16.11                1151   1.0%
15.23                1005   0.9%
8.72                 885    0.8%
16.79                872    0.8%
Other values (6967)  94569  84.5%

Lowest values

Value  Count  Frequency (%)
0      381    0.3%
0.01   4      < 0.1%
0.02   3      < 0.1%
0.03   14     < 0.1%
0.04   4      < 0.1%
0.05   4      < 0.1%
0.06   11     < 0.1%
0.07   1      < 0.1%
0.08   12     < 0.1%
0.09   6      < 0.1%

Highest values

Value   Count  Frequency (%)
409.68  1      < 0.1%
375.28  2      < 0.1%
339.59  1      < 0.1%
338.3   1      < 0.1%
322.1   1      < 0.1%
321.88  1      < 0.1%
321.46  1      < 0.1%
317.47  1      < 0.1%
314.4   1      < 0.1%
314.02  1      < 0.1%

product_category_name
Categorical

HIGH CARDINALITY
MISSING

Distinct: 73
Distinct (%): 0.1%
Missing: 2315
Missing (%): 2.1%
Memory size: 874.5 KiB

Length

Max length: 46
Median length: 32
Mean length: 14.8608587
Min length: 3

Characters and Unicode

Total characters: 1628869
Distinct characters: 28
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: utilidades_domesticas
2nd row: perfumaria
3rd row: automotivo
4th row: pet_shop
5th row: papelaria

Common Values

Value                   Count  Frequency (%)
cama_mesa_banho         11016  9.8%
beleza_saude            9487   8.5%
esporte_lazer           8541   7.6%
moveis_decoracao        8223   7.3%
informatica_acessorios  7752   6.9%
utilidades_domesticas   6840   6.1%
relogios_presentes      5932   5.3%
telefonia               4501   4.0%
ferramentas_jardim      4325   3.9%
automotivo              4174   3.7%
Other values (63)       38817  34.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
cama_mesa_banho11016
 
10.1%
beleza_saude9487
 
8.7%
esporte_lazer8541
 
7.8%
moveis_decoracao8223
 
7.5%
informatica_acessorios7752
 
7.1%
utilidades_domesticas6840
 
6.2%
relogios_presentes5932
 
5.4%
telefonia4501
 
4.1%
ferramentas_jardim4325
 
3.9%
automotivo4174
 
3.8%
Other values (63)38817
35.4%

Most occurring characters

ValueCountFrequency (%)
e198340
12.2%
a195308
12.0%
s162363
10.0%
o161280
9.9%
i108297
 
6.6%
r105230
 
6.5%
_103261
 
6.3%
t78568
 
4.8%
c77136
 
4.7%
m73290
 
4.5%
Other values (18)365796
22.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1525341
93.6%
Connector Punctuation103261
 
6.3%
Decimal Number267
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e198340
13.0%
a195308
12.8%
s162363
10.6%
o161280
10.6%
i108297
 
7.1%
r105230
 
6.9%
t78568
 
5.2%
c77136
 
5.1%
m73290
 
4.8%
n55685
 
3.7%
Other values (16)309844
20.3%
Connector Punctuation
ValueCountFrequency (%)
_103261
100.0%
Decimal Number
ValueCountFrequency (%)
2267
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1525341
93.6%
Common103528
 
6.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
e198340
13.0%
a195308
12.8%
s162363
10.6%
o161280
10.6%
i108297
 
7.1%
r105230
 
6.9%
t78568
 
5.2%
c77136
 
5.1%
m73290
 
4.8%
n55685
 
3.7%
Other values (16)309844
20.3%
Common
ValueCountFrequency (%)
_103261
99.7%
2267
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII1628869
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e198340
12.2%
a195308
12.0%
s162363
10.0%
o161280
9.9%
i108297
 
6.6%
r105230
 
6.5%
_103261
 
6.3%
t78568
 
4.8%
c77136
 
4.7%
m73290
 
4.5%
Other values (18)365796
22.5%

customer_unique_id
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 94855
Distinct (%): 84.8%
Missing: 0
Missing (%): 0.0%
Memory size: 874.5 KiB

Length

Max length: 32
Median length: 32
Mean length: 32
Min length: 32

Characters and Unicode

Total characters: 3581536
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 83088
Unique (%): 74.2%

Sample

1st row: 7c396fd4830fd04220f754e42b4e5bff
2nd row: af07308b275d755c9edb36a90c618231
3rd row: 3a653a41f6f9fc3d2a113cf8398680e8
4th row: 7c142cf63193a1473d2e66489a9ae977
5th row: 72632f0f9dd73dfee390c9b22eb56dd6

Common Values

Value                             Count   Frequency (%)
c8460e4251689ba205045f3ea17884a1  24      < 0.1%
4546caea018ad8c692964e3382debd19  21      < 0.1%
c402f431464c72e27330a67f7b94d4fb  20      < 0.1%
698e1cf81d01a3d389d96145f7fa6df8  20      < 0.1%
0f5ac8d5c31de21d2f25e24be15bbffb  18      < 0.1%
8d50f5eadf50201ccdcedfb9e2ac8455  17      < 0.1%
11f97da02237a49c8e783dfda6f50e8e  15      < 0.1%
eae0a83d752b1dd32697e0e7b4221656  15      < 0.1%
3e43e6105506432c953e165fb2acf44c  14      < 0.1%
f7ea4eef770a388bd5b225acfc546604  14      < 0.1%
Other values (94845)              111745  99.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
c8460e4251689ba205045f3ea17884a124
 
< 0.1%
4546caea018ad8c692964e3382debd1921
 
< 0.1%
c402f431464c72e27330a67f7b94d4fb20
 
< 0.1%
698e1cf81d01a3d389d96145f7fa6df820
 
< 0.1%
0f5ac8d5c31de21d2f25e24be15bbffb18
 
< 0.1%
8d50f5eadf50201ccdcedfb9e2ac845517
 
< 0.1%
11f97da02237a49c8e783dfda6f50e8e15
 
< 0.1%
eae0a83d752b1dd32697e0e7b422165615
 
< 0.1%
31e412b9fb766b6794724ed17a41dfa614
 
< 0.1%
3e43e6105506432c953e165fb2acf44c14
 
< 0.1%
Other values (94845)111745
99.8%

Most occurring characters

ValueCountFrequency (%)
6224540
 
6.3%
1224529
 
6.3%
e224189
 
6.3%
8224183
 
6.3%
a224044
 
6.3%
9224023
 
6.3%
d223980
 
6.3%
5223931
 
6.3%
3223876
 
6.3%
0223850
 
6.3%
Other values (6)1340391
37.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2239049
62.5%
Lowercase Letter1342487
37.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
6224540
10.0%
1224529
10.0%
8224183
10.0%
9224023
10.0%
5223931
10.0%
3223876
10.0%
0223850
10.0%
2223780
10.0%
7223383
10.0%
4222954
10.0%
Lowercase Letter
ValueCountFrequency (%)
e224189
16.7%
a224044
16.7%
d223980
16.7%
b223778
16.7%
f223373
16.6%
c223123
16.6%

Most occurring scripts

ValueCountFrequency (%)
Common2239049
62.5%
Latin1342487
37.5%

Most frequent character per script

Common
ValueCountFrequency (%)
6224540
10.0%
1224529
10.0%
8224183
10.0%
9224023
10.0%
5223931
10.0%
3223876
10.0%
0223850
10.0%
2223780
10.0%
7223383
10.0%
4222954
10.0%
Latin
ValueCountFrequency (%)
e224189
16.7%
a224044
16.7%
d223980
16.7%
b223778
16.7%
f223373
16.6%
c223123
16.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII3581536
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6224540
 
6.3%
1224529
 
6.3%
e224189
 
6.3%
8224183
 
6.3%
a224044
 
6.3%
9224023
 
6.3%
d223980
 
6.3%
5223931
 
6.3%
3223876
 
6.3%
0223850
 
6.3%
Other values (6)1340391
37.4%

customer_city
Categorical

HIGH CARDINALITY

Distinct: 4111
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Memory size: 874.5 KiB

Length

Max length: 32
Median length: 27
Mean length: 10.33843803
Min length: 3

Characters and Unicode

Total characters: 1157109
Distinct characters: 31
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1060
Unique (%): 0.9%

Sample

1st row: sao paulo
2nd row: barreiras
3rd row: vianopolis
4th row: sao goncalo do amarante
5th row: santo andre

Common Values

Value                  Count  Frequency (%)
sao paulo              17590  15.7%
rio de janeiro         7783   7.0%
belo horizonte         3117   2.8%
brasilia               2380   2.1%
curitiba               1703   1.5%
campinas               1633   1.5%
porto alegre           1601   1.4%
salvador               1415   1.3%
guarulhos              1313   1.2%
sao bernardo do campo  1047   0.9%
Other values (4101)    72341  64.6%

Length

Histogram of lengths of the category
Value                Count   Frequency (%)
sao                  23726   12.1%
paulo                17670   9.0%
de                   10904   5.6%
rio                  9376    4.8%
janeiro              7783    4.0%
do                   4835    2.5%
belo                 3186    1.6%
horizonte            3145    1.6%
brasilia             2390    1.2%
porto                1910    1.0%
Other values (3279)  111242  56.7%

Most occurring characters

ValueCountFrequency (%)
a190734
16.5%
o142753
12.3%
i88671
 
7.7%
r86013
 
7.4%
(space)84244
 
7.3%
e75242
 
6.5%
s70677
 
6.1%
n51406
 
4.4%
u50594
 
4.4%
l50440
 
4.4%
Other values (21)266335
23.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1072352
92.7%
Space Separator84244
 
7.3%
Dash Punctuation260
 
< 0.1%
Other Punctuation251
 
< 0.1%
Decimal Number2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a190734
17.8%
o142753
13.3%
i88671
 
8.3%
r86013
 
8.0%
e75242
 
7.0%
s70677
 
6.6%
n51406
 
4.8%
u50594
 
4.7%
l50440
 
4.7%
p41966
 
3.9%
Other values (16)223856
20.9%
Decimal Number
ValueCountFrequency (%)
11
50.0%
41
50.0%
Space Separator
ValueCountFrequency (%)
(space)84244
100.0%
Dash Punctuation
ValueCountFrequency (%)
-260
100.0%
Other Punctuation
ValueCountFrequency (%)
'251
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1072352
92.7%
Common84757
 
7.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
a190734
17.8%
o142753
13.3%
i88671
 
8.3%
r86013
 
8.0%
e75242
 
7.0%
s70677
 
6.6%
n51406
 
4.8%
u50594
 
4.7%
l50440
 
4.7%
p41966
 
3.9%
Other values (16)223856
20.9%
Common
ValueCountFrequency (%)
(space)84244
99.4%
-260
 
0.3%
'251
 
0.3%
11
 
< 0.1%
41
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1157109
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a190734
16.5%
o142753
12.3%
i88671
 
7.7%
r86013
 
7.4%
(space)84244
 
7.3%
e75242
 
6.5%
s70677
 
6.1%
n51406
 
4.4%
u50594
 
4.4%
l50440
 
4.4%
Other values (21)266335
23.0%

customer_state
Categorical

Distinct27
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size874.5 KiB
SP
46972 
RJ
14509 
MG
13068 
RS
6211 
PR
5685 
Other values (22)
25478 

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters223846
Distinct characters17
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowSP
2nd rowBA
3rd rowGO
4th rowRN
5th rowSP

Common Values

ValueCountFrequency (%)
SP46972
42.0%
RJ14509
 
13.0%
MG13068
 
11.7%
RS6211
 
5.5%
PR5685
 
5.1%
SC4161
 
3.7%
BA3811
 
3.4%
DF2394
 
2.1%
GO2329
 
2.1%
ES2250
 
2.0%
Other values (17)10533
 
9.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
sp46972
42.0%
rj14509
 
13.0%
mg13068
 
11.7%
rs6211
 
5.5%
pr5685
 
5.1%
sc4161
 
3.7%
ba3811
 
3.4%
df2394
 
2.1%
go2329
 
2.1%
es2250
 
2.0%
Other values (17)10533
 
9.4%

Most occurring characters

ValueCountFrequency (%)
S60805
27.2%
P56759
25.4%
R27314
12.2%
M15925
 
7.1%
G15397
 
6.9%
J14509
 
6.5%
A6492
 
2.9%
E5911
 
2.6%
C5728
 
2.6%
B4413
 
2.0%
Other values (7)10593
 
4.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter223846
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S60805
27.2%
P56759
25.4%
R27314
12.2%
M15925
 
7.1%
G15397
 
6.9%
J14509
 
6.5%
A6492
 
2.9%
E5911
 
2.6%
C5728
 
2.6%
B4413
 
2.0%
Other values (7)10593
 
4.7%

Most occurring scripts

ValueCountFrequency (%)
Latin223846
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
S60805
27.2%
P56759
25.4%
R27314
12.2%
M15925
 
7.1%
G15397
 
6.9%
J14509
 
6.5%
A6492
 
2.9%
E5911
 
2.6%
C5728
 
2.6%
B4413
 
2.0%
Other values (7)10593
 
4.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII223846
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S60805
27.2%
P56759
25.4%
R27314
12.2%
M15925
 
7.1%
G15397
 
6.9%
J14509
 
6.5%
A6492
 
2.9%
E5911
 
2.6%
C5728
 
2.6%
B4413
 
2.0%
Other values (7)10593
 
4.7%

dia_compra
Real number (ℝ≥0)

Distinct31
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean15.51285259
Minimum1
Maximum31
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size874.5 KiB

Quantile statistics

Minimum1
5-th percentile2
Q18
median15
Q323
95-th percentile29
Maximum31
Range30
Interquartile range (IQR)15

Descriptive statistics

Standard deviation8.667442147
Coefficient of variation (CV)0.5587265204
Kurtosis-1.171315268
Mean15.51285259
Median Absolute Deviation (MAD)8
Skewness0.02641766669
Sum1736245
Variance75.12455336
MonotonicityNot monotonic
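The summary figures for dia_compra are tied together by a few simple identities, which makes them easy to sanity-check. The sketch below copies the reported values and verifies that the mean is sum / n, the variance is the squared standard deviation, the coefficient of variation is standard deviation / mean, and the range and IQR follow from the quantiles. The tolerance is an assumption chosen to absorb the rounding of the printed figures.

```python
import math

# Values copied from the dia_compra summary in this report.
n = 111923
total = 1736245
mean = 15.51285259
std = 8.667442147
variance = 75.12455336
cv = 0.5587265204
minimum, q1, q3, maximum = 1, 8, 23, 31

# Each entry is True when the reported figures agree with the identity.
checks = {
    "mean = sum / n": math.isclose(total / n, mean, rel_tol=1e-6),
    "variance = std ** 2": math.isclose(std ** 2, variance, rel_tol=1e-6),
    "CV = std / mean": math.isclose(std / mean, cv, rel_tol=1e-6),
    "range = max - min": maximum - minimum == 30,
    "IQR = Q3 - Q1": q3 - q1 == 15,
}

for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")
```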
Histogram with fixed size bins (bins=31)
ValueCountFrequency (%)
244302
 
3.8%
164089
 
3.7%
153983
 
3.6%
63908
 
3.5%
183901
 
3.5%
193881
 
3.5%
53871
 
3.5%
143831
 
3.4%
43815
 
3.4%
73761
 
3.4%
Other values (21)72581
64.8%
ValueCountFrequency (%)
13503
3.1%
23655
3.3%
33678
3.3%
43815
3.4%
53871
3.5%
63908
3.5%
73761
3.4%
83727
3.3%
93671
3.3%
103533
3.2%
ValueCountFrequency (%)
311925
1.7%
302877
2.6%
292917
2.6%
283365
3.0%
273570
3.2%
263720
3.3%
253706
3.3%
244302
3.8%
233462
3.1%
223488
3.1%

mes_compra
Real number (ℝ≥0)

HIGH CORRELATION

Distinct12
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean6.000670104
Minimum1
Maximum12
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size874.5 KiB

Quantile statistics

Minimum1
5-th percentile1
Q13
median6
Q38
95-th percentile12
Maximum12
Range11
Interquartile range (IQR)5

Descriptive statistics

Standard deviation3.241051271
Coefficient of variation (CV)0.5401148896
Kurtosis-0.9756885162
Mean6.000670104
Median Absolute Deviation (MAD)2
Skewness0.2285576007
Sum671613
Variance10.50441334
MonotonicityNot monotonic
Histogram with fixed size bins (bins=12)
ValueCountFrequency (%)
512121
10.8%
711687
10.4%
311281
10.1%
811193
10.0%
610696
9.6%
410677
9.5%
29704
8.7%
19191
8.2%
118758
7.8%
126357
5.7%
Other values (2)10258
9.2%
ValueCountFrequency (%)
19191
8.2%
29704
8.7%
311281
10.1%
410677
9.5%
512121
10.8%
610696
9.6%
711687
10.4%
811193
10.0%
94873
4.4%
105385
4.8%
ValueCountFrequency (%)
126357
5.7%
118758
7.8%
105385
4.8%
94873
4.4%
811193
10.0%
711687
10.4%
610696
9.6%
512121
10.8%
410677
9.5%
311281
10.1%

ano_compra
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size874.5 KiB
2018
60569 
2017
51354 

Length

Max length4
Median length4
Mean length4
Min length4

Characters and Unicode

Total characters447692
Distinct characters5
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row2017
2nd row2018
3rd row2018
4th row2017
5th row2018

Common Values

ValueCountFrequency (%)
201860569
54.1%
201751354
45.9%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
201860569
54.1%
201751354
45.9%

Most occurring characters

ValueCountFrequency (%)
2111923
25.0%
0111923
25.0%
1111923
25.0%
860569
13.5%
751354
11.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number447692
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2111923
25.0%
0111923
25.0%
1111923
25.0%
860569
13.5%
751354
11.5%

Most occurring scripts

ValueCountFrequency (%)
Common447692
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2111923
25.0%
0111923
25.0%
1111923
25.0%
860569
13.5%
751354
11.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII447692
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2111923
25.0%
0111923
25.0%
1111923
25.0%
860569
13.5%
751354
11.5%

ano_mes
Categorical

HIGH CORRELATION

Distinct20
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size874.5 KiB
2017-11
8758 
2018-01
8257 
2018-03
8240 
2018-04
7980 
2018-05
7945 
Other values (15)
70743 

Length

Max length7
Median length7
Mean length7
Min length7

Characters and Unicode

Total characters783461
Distinct characters11
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row2017-10
2nd row2018-07
3rd row2018-08
4th row2017-11
5th row2018-02
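The report does not state how dia_compra, mes_compra, ano_compra and ano_mes were produced. Their names and the sample rows suggest they are the day, month, year and year-month of order_purchase_timestamp; that derivation is an assumption, and the helper `purchase_parts` below is hypothetical. If it holds, the high-correlation alerts between these columns are expected, since ano_mes is fully determined by the other two.

```python
from datetime import date


def purchase_parts(purchase_date):
    """Split a purchase date into the four derived columns (assumed derivation)."""
    return {
        "dia_compra": purchase_date.day,
        "mes_compra": purchase_date.month,
        "ano_compra": str(purchase_date.year),  # profiled as Categorical in the report
        "ano_mes": f"{purchase_date.year}-{purchase_date.month:02d}",
    }


# Purchase date of the first sample row in this report.
parts = purchase_parts(date(2017, 10, 2))
print(parts)
```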

Common Values

ValueCountFrequency (%)
2017-118758
 
7.8%
2018-018257
 
7.4%
2018-038240
 
7.4%
2018-047980
 
7.1%
2018-057945
 
7.1%
2018-027706
 
6.9%
2018-077111
 
6.4%
2018-067085
 
6.3%
2017-126357
 
5.7%
2018-086245
 
5.6%
Other values (10)36239
32.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2017-118758
 
7.8%
2018-018257
 
7.4%
2018-038240
 
7.4%
2018-047980
 
7.1%
2018-057945
 
7.1%
2018-027706
 
6.9%
2018-077111
 
6.4%
2018-067085
 
6.3%
2017-126357
 
5.7%
2018-086245
 
5.6%
Other values (10)36239
32.4%

Most occurring characters

ValueCountFrequency (%)
0208731
26.6%
1150372
19.2%
2127984
16.3%
-111923
14.3%
871762
 
9.2%
763041
 
8.0%
512121
 
1.5%
311281
 
1.4%
610696
 
1.4%
410677
 
1.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number671538
85.7%
Dash Punctuation111923
 
14.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0208731
31.1%
1150372
22.4%
2127984
19.1%
871762
 
10.7%
763041
 
9.4%
512121
 
1.8%
311281
 
1.7%
610696
 
1.6%
410677
 
1.6%
94873
 
0.7%
Dash Punctuation
ValueCountFrequency (%)
-111923
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common783461
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0208731
26.6%
1150372
19.2%
2127984
16.3%
-111923
14.3%
871762
 
9.2%
763041
 
8.0%
512121
 
1.5%
311281
 
1.4%
610696
 
1.4%
410677
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII783461
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0208731
26.6%
1150372
19.2%
2127984
16.3%
-111923
14.3%
871762
 
9.2%
763041
 
8.0%
512121
 
1.5%
311281
 
1.4%
610696
 
1.4%
410677
 
1.4%

Interactions

[Figure: 25 interaction plots]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
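As a minimal sketch of that recipe, the code below ranks both variables (ties receive the average of their positions) and then applies the covariance-over-standard-deviations formula to the ranks. The function names are illustrative and this is not the code the profiler itself runs.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # monotonic but nonlinear in x
rho = spearman(x, y)
r = pearson(x, y)
print(rho, r)
```

Because y increases whenever x does, ρ reaches 1 while Pearson's r on the raw values stays below 1.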

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
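The following sketch implements that definition directly and illustrates the invariance property: shifting and rescaling either variable by a positive factor leaves r unchanged, while negating one variable only flips the sign. The data points and the name `pearson_r` are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # roughly linear in x

r = pearson_r(x, y)
# Separate changes of location and (positive) scale on each variable.
r_scaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
# Negating one variable reverses the direction of the relationship.
r_flipped = pearson_r(x, [-b for b in y])
print(r, r_scaled, r_flipped)
```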

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
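The pair-counting description corresponds to the simplest form of the coefficient, often called tau-a. The sketch below implements exactly that; library implementations typically use a tie-corrected variant, so results on data with many ties (such as the day and month columns here) may differ from the plotted values. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables order the pair the same way
        elif sign < 0:
            discordant += 1  # the variables order the pair in opposite ways
        # sign == 0 is a tie in x or y and counts as neither
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# One swapped pair out of six: 5 concordant, 1 discordant.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```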

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
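A minimal sketch of the bias-corrected estimator follows, computed from a contingency table of counts. It assumes every row and column total is non-zero and applies no continuity correction to the chi-squared statistic, so it may not match the report's own implementation in every detail; the function name is illustrative.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V (Bergsma, 2013) for an r x k table of counts."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Subtract the expected value of phi2 under independence, floored at zero.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


v_perfect = cramers_v_corrected([[50, 0], [0, 50]])
v_independent = cramers_v_corrected([[25, 25], [25, 25]])
print(v_perfect, v_independent)
```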

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
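One way to read nullity correlation is as the Pearson correlation between per-column "is missing" indicators. The sketch below assumes that interpretation and uses two short made-up columns standing in for order_delivered_carrier_date and order_delivered_customer_date; the function name and the data are illustrative only.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sa = math.sqrt(sum((p - ma) ** 2 for p in a))
    sb = math.sqrt(sum((q - mb) ** 2 for q in b))
    return cov / (sa * sb)


# Made-up columns: None marks a missing date.
carrier_date = ["2017-10-04", None, "2018-08-08", None, "2018-02-14", "2017-07-11"]
customer_date = ["2017-10-10", None, "2018-08-17", None, None, "2017-07-26"]

corr = nullity_correlation(carrier_date, customer_date)
print(round(corr, 4))
```

A value near 1 means the two columns tend to be missing on the same rows, as would be expected for orders that were never handed to the carrier.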
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

order_idcustomer_idorder_statusorder_purchase_timestamporder_approved_atorder_delivered_carrier_dateorder_delivered_customer_dateorder_estimated_delivery_dateorder_item_idproduct_idseller_idshipping_limit_datepricefreight_valueproduct_category_namecustomer_unique_idcustomer_citycustomer_statedia_comprames_compraano_compraano_mes
0e481f51cbdc54678b7cc49136f2d6af79ef432eb6251297304e76186b10a928ddelivered2017-10-022017-10-022017-10-042017-10-102017-10-181.087285b34884572647811a353c7ac498a3504c0cb71d7fa48d967e0e4c94d59d92017-10-0629.998.72utilidades_domesticas7c396fd4830fd04220f754e42b4e5bffsao pauloSP21020172017-10
153cdb2fc8bc7dce0b6741e2150273451b0830fb4747a6c6d20dea0b8c802d7efdelivered2018-07-242018-07-262018-07-262018-08-072018-08-131.0595fac2a385ac33a80bd5114aec74eb8289cdb325fb7e7f891c38608bf9e09622018-07-30118.7022.76perfumariaaf07308b275d755c9edb36a90c618231barreirasBA24720182018-07
247770eb9100c2d0c44946d9cf07ec65d41ce2a54c0b03bf3443c3d931a367089delivered2018-08-082018-08-082018-08-082018-08-172018-09-041.0aa4383b373c6aca5d8797843e55944154869f7a5dfa277a7dca6462dcf3b52b22018-08-13159.9019.22automotivo3a653a41f6f9fc3d2a113cf8398680e8vianopolisGO8820182018-08
3949d5b44dbf5de918fe9c16f97b45f8af88197465ea7920adcdbec7375364d82delivered2017-11-182017-11-182017-11-222017-12-022017-12-151.0d0b61bfb1de832b15ba9d266ca96e5b066922902710d126a0e7d26b0e38051062017-11-2345.0027.20pet_shop7c142cf63193a1473d2e66489a9ae977sao goncalo do amaranteRN181120172017-11
4ad21c59c0840e6cb83a9ceb5573f81598ab97904e6daea8866dbdbc4fb7aad2cdelivered2018-02-132018-02-132018-02-142018-02-162018-02-261.065266b2da20d04dbe00c5c2d3bb7859e2c9e548be18521d1c43cde1c582c6de82018-02-1919.908.72papelaria72632f0f9dd73dfee390c9b22eb56dd6santo andreSP13220182018-02
5a4591c265e18cb1dcee52889e2d8acc3503740e9ca751ccdda7ba28e9ab8f608delivered2017-07-092017-07-092017-07-112017-07-262017-08-011.0060cb19345d90064d1015407193c233d8581055ce74af1daba164fdbd55a40de2017-07-13147.9027.36automotivo80bb27c7c16e8f973207a5086ab329e2congonhinhasPR9720172017-07
6136cce7faa42fdb2cefd53fdc79a6098ed0271e0b7da060a393796590e7b737ainvoiced2017-04-112017-04-13NaTNaT2017-05-091.0a1804276d9941ac0733cfd409f5206ebdc8798cbf453b7e0f98745e396cc56162017-04-1949.9016.05NaN36edbb3fb164b1f16485364b6fb04c73santa rosaRS11420172017-04
76514b8ad8028c9f2cc2374ded245783f9bdf08b4b3b52b5526ff42d37d47f222delivered2017-05-162017-05-162017-05-222017-05-262017-06-071.04520766ec412348b8d4caa5e8a18c46416090f2ca825584b5a147ab24aa30c862017-05-2259.9915.17automotivo932afa1e708222e5821dac9cd5db4caenilopolisRJ16520172017-05
876c6e866289321a7c93b82b54852dc33f54a9f0e6b351c431402b8461ea51999delivered2017-01-232017-01-252017-01-262017-02-022017-03-061.0ac1789e492dcd698c5c10b97a671243a63b9ae557efed31d1f7687917d248a8d2017-01-2719.9016.05moveis_decoracao39382392765b6dc74812866ee5ee92a7faxinalzinhoRS23120172017-01
9e69bfb5eb88e0ed6a785585b27e16dbf31ad1d1b63eb9962463f764d4e6e0c9ddelivered2017-07-292017-07-292017-08-102017-08-162017-08-231.09a78fb9862b10749a117f7fc3c31f0517c67e1448b00f6e969d365cea6b010ab2017-08-11149.9919.77moveis_escritorio299905e3934e9e181bfb2e164dd4b4f8sorocabaSP29720172017-07

Last rows

order_idcustomer_idorder_statusorder_purchase_timestamporder_approved_atorder_delivered_carrier_dateorder_delivered_customer_dateorder_estimated_delivery_dateorder_item_idproduct_idseller_idshipping_limit_datepricefreight_valueproduct_category_namecustomer_unique_idcustomer_citycustomer_statedia_comprames_compraano_compraano_mes
1119139115830be804184b91f5c00f6f49f92dda2124f134f5dfbce9d06f29bdb6c308delivered2017-10-042017-10-042017-10-052017-10-202017-11-071.0c982dbea53b864f4d27c1d36f14b60531caf283236cd69af44cbc09a0a1e7d322017-10-1042.110.80brinquedosc716cf2b5b86fb24257cffe9e7969df8cuiabaMT41020172017-10
1119149115830be804184b91f5c00f6f49f92dda2124f134f5dfbce9d06f29bdb6c308delivered2017-10-042017-10-042017-10-052017-10-202017-11-072.049d2e2460386273b195e7e59b43587c31caf283236cd69af44cbc09a0a1e7d322017-10-1026.9036.98brinquedosc716cf2b5b86fb24257cffe9e7969df8cuiabaMT41020172017-10
111915aa04ef5214580b06b10e2a378300db44f01a6bfcc730456317e4081fe0c9940edelivered2017-01-272017-01-272017-01-302017-02-072017-03-171.09fc063fd34fed29ccc57b7f8e8d03388ccc4bbb5f32a6ab2b7066a4130f114e32017-02-03370.0019.43beleza_saudee03dbdf5e56c96b106d8115ac336f47fdivinopolisMG27120172017-01
111916880675dff2150932f1601e1c07eadeeb47cd45a6ac7b9fb16537df2ccffeb5acdelivered2017-02-232017-02-232017-03-012017-03-062017-03-221.0ea73128566d1b082e5101ce46f8107c7391fc6631aebcf3004804e51b40bcf1e2017-02-27139.9016.09moveis_decoracao831ce3f1bacbd424fc4e38fbd4d66d29sao pauloSP23220172017-02
1119179c5dedf39a927c1b2549525ed64a053c39bd1228ee8140590ac3aca26f2dfe00delivered2017-03-092017-03-092017-03-102017-03-172017-03-281.0ac35486adb7b02598c182c2ff2e05254e24fc9fcd865784fb25705606fe3dfe72017-03-1572.0013.08beleza_saude6359f309b166b0196dbf7ad2ac62bb5asao jose dos camposSP9320172017-03
11191863943bddc261676b46f01ca7ac2f7bd81fca14ff2861355f6e5f14306ff977a7delivered2018-02-062018-02-062018-02-072018-02-282018-03-021.0f1d4ce8c6dd66c47bbaa8c6781c2a9231f9ab4708f3056ede07124aad39a25542018-02-12174.9020.10bebesda62f9e57a76d978d02ab5362c509660praia grandeSP6220182018-02
11191983c1379a015df1e13d02aae0204711ab1aa71eb042121263aafbe80c1b562c9cdelivered2017-08-272017-08-272017-08-282017-09-212017-09-271.0b80910977a37536adeddd63663f916add50d79cb34e38265a8649c383dcffd482017-09-05205.9965.02eletrodomesticos_2737520a9aad80b3fbbdad19b66b37b30nova vicosaBA27820172017-08
11192011c177c8e97725db2631073c19f07b62b331b74b18dc79bcdf6532d51e1637c1delivered2018-01-082018-01-082018-01-122018-01-252018-02-151.0d1c427060a0f73f6b889a5c7c61f2ac4a1043bafd471dff536d0c462352beb482018-01-12179.9940.59informatica_acessorios5097a5312c8b157bb7be58ae360ef43cjapuibaRJ8120182018-01
11192111c177c8e97725db2631073c19f07b62b331b74b18dc79bcdf6532d51e1637c1delivered2018-01-082018-01-082018-01-122018-01-252018-02-152.0d1c427060a0f73f6b889a5c7c61f2ac4a1043bafd471dff536d0c462352beb482018-01-12179.9940.59informatica_acessorios5097a5312c8b157bb7be58ae360ef43cjapuibaRJ8120182018-01
11192266dea50a8b16d9b4dee7af250b4be1a5edb027a75a1449115f6b43211ae02a24delivered2018-03-082018-03-092018-03-092018-03-162018-04-031.0006619bbed68b000c8ba3f8725d5409eececbfcff9804a2d6b40f589df8eef2b2018-03-1568.5018.36beleza_saude60350aa974b26ff12caad89e55993bd6lapaPR8320182018-03